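[Tutorial] Deploying an Enterprise-Grade, Highly Available K8s Cluster with Ansible
This is a development document aimed at developers and AI, reposted from my documentation site. Original URL:
The development environment used here is Linux, and files are edited with the micro CLI editor; adjust the steps to your own system.
Basic Concepts
About Ansible
Ansible is an agentless automation tool that expresses configuration and changes as clear, repeatable tasks. It is good at applying consistent configuration across many hosts, and it also suits application deployment and batch operations. Paired with a load balancer, it can break a complex change into controlled rolling steps.
Ansible is also a very good fit for deploying and managing HAProxy.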

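About Kubernetes and RKE2
Kubernetes (K8s) is a container orchestration system that provides core capabilities such as scheduling, service discovery, rolling updates, and self-healing. Its goal is to standardize how distributed applications run so that operations become more controllable.
RKE2 (also known as RKE Government) is the Kubernetes distribution provided by Rancher. It passes Kubernetes conformance, its defaults lean toward security and compliance, and it is suitable for production.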

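About Rocky Linux and SELinux
Rocky Linux is an open-source enterprise operating system that aims for bug-for-bug compatibility with RHEL. Its lifecycle is stable, which suits long-running production clusters.

SELinux is a mandatory access control (MAC) mechanism that tightly restricts what processes may access. Rocky Linux enables it by default in enforcing mode; configure it through policy rather than turning it off.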

Getting Started
Install Ansible
Install Ansible (using yay as an example):
yay -S ansible
Run ansible --version to see the version information.
yun@yun ~/V/a/yunzaixi-dev (main)> ansible --version
ansible [core 2.20.0]
config file = None
configured module search path = ['/home/yun/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3.13/site-packages/ansible
ansible collection location = /home/yun/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/bin/ansible
python version = 3.13.7 (main, Aug 16 2025, 15:55:01) [GCC 15.2.1 20250813] (/usr/bin/python)
jinja version = 3.1.6
pyyaml version = 6.0.3 (with libyaml v0.2.5)
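Ansible is implemented in Python, so make sure a Python environment is set up before installing it.
lablabs.rke2 depends on the netaddr Python package, which must be installed separately. On Arch Linux, use sudo pacman -S python-netaddr.
Install Version Control Tools
Install git and gh (using yay as an example):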
yay -S git github-cli
Run git version and gh version to see the version information.
yun@yun ~/V/a/yunzaixi-dev (main)> git version
git version 2.52.0
yun@yun ~/V/a/yunzaixi-dev (main)> gh version
gh version 2.83.1 (2025-11-13)
https://github.com/cli/cli/releases/tag/v2.83.1
Log in to GitHub:
gh auth login --scopes workflow
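Follow the prompts.
Prepare Cloud Servers
Before anything else, we need cloud servers to deploy the cluster on. A minimal production-grade HA setup (control plane + etcd) is typically 3 rke2-server nodes (embedded etcd) plus at least one rke2-agent, so at least 4 cloud servers are needed for the steps that follow. The examples in this tutorial use 5 (3 servers and 2 agents).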
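The reason for three servers is that etcd needs a majority of its members to agree before it accepts writes. The sketch below works through that arithmetic; the function names are made up for illustration and are not part of RKE2 or etcd.

```shell
# Quorum arithmetic for an etcd cluster with N voting members.
#   quorum             = floor(N / 2) + 1
#   tolerated failures = N - quorum

# Hypothetical helper: smallest number of members that forms a majority.
etcd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Hypothetical helper: how many members can fail while a majority remains.
etcd_fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
  echo "servers=$n quorum=$(etcd_quorum "$n") tolerated_failures=$(etcd_fault_tolerance "$n")"
done
```

With 3 servers the cluster survives the loss of one; a fourth server raises the quorum without raising the tolerated failures, which is why odd member counts are the usual choice.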
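To simplify operations, every server runs Rocky Linux.
Why Rocky Linux: it is a free, open-source enterprise operating system that aims for bug-for-bug compatibility with RHEL, and it is listed in the RKE2 support matrix.
RKE2 is very lightweight, but it has some minimum requirements:
- No two RKE2 nodes may share the same node name. By default the node name is taken from the machine's hostname, so the Linux cloud servers must not share a hostname
- Each cloud server should have at least 2 CPU cores and 4 GB of RAM, and use an SSD for storage
- The required firewall ports must be open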
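The unique-node-name requirement is easy to check before any deployment. Below is a minimal sketch that reports duplicated names in a list; `find_duplicate_names` is a hypothetical helper, and the names passed to it are the SSH aliases used later in this tutorial.

```shell
# Hypothetical helper: print every name that appears more than once
# among the arguments (one duplicate per line, nothing if all are unique).
find_duplicate_names() {
  printf '%s\n' "$@" | sort | uniq -d
}

dups="$(find_duplicate_names rke2-server1 rke2-server2 rke2-server3 rke2-agent1 rke2-agent2)"
if [ -n "$dups" ]; then
  echo "duplicate node names: $dups"
else
  echo "all node names are unique"
fi
```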
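Configure SSH Config
Add the following to your SSH config (set HostName to each cloud server's public IP address):
Host rke2-server1
    HostName <your-public-ip-1>
    User root
Host rke2-server2
    HostName <your-public-ip-2>
    User root
Host rke2-server3
    HostName <your-public-ip-3>
    User root
Host rke2-agent1
    HostName <your-public-ip-4>
    User root
Host rke2-agent2
    HostName <your-public-ip-5>
    User root
The config above defines an SSH alias for every cloud server, which greatly simplifies later operations. Next, upload your SSH public key to each target server: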
ssh-copy-id rke2-server1
ssh-copy-id rke2-server2
ssh-copy-id rke2-server3
ssh-copy-id rke2-agent1
ssh-copy-id rke2-agent2
If a server was reinstalled earlier, you may need to clear its old SSH host key first:
ssh-keygen -R rke2-server1
ssh-keygen -R rke2-server2
ssh-keygen -R rke2-server3
ssh-keygen -R rke2-agent1
ssh-keygen -R rke2-agent2
Follow the prompts.
Once this is done, you can log in to every cloud server without a password:
ssh rke2-server1
ssh rke2-server2
ssh rke2-server3
ssh rke2-agent1
ssh rke2-agent2
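After login, a warning says that without a post-quantum key exchange the session could be decrypted by attackers in the future (very future-proof indeed). It can be ignored for now: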
** WARNING: connection is not using a post-quantum key exchange algorithm.
** This session may be vulnerable to "store now, decrypt later" attacks.
** The server may need to be upgraded. See https://openssh.com/pq.html
Last failed login: ~~ from ~~ on ssh:notty There were 31 failed login attempts since the last successful login.
Initialize the Ansible Project
Initialize the Repository
First create a folder; assume the project is named rke2-ansible
yun@yun ~/V/a/y/p/ansible (main)> mkdir rke2-ansible
yun@yun ~/V/a/y/p/ansible (main)> ls
rke2-ansible/
Enter the project directory, initialize git, and create a private GitHub repository:
cd rke2-ansible
git init
echo "# rke2-ansible" > README.md
git add .
git commit -m "chore: initial commit"
gh repo create rke2-ansible --private --source=. --remote=origin --push
The following is optional; it registers the newly created repository as a git submodule of the parent repository:
cd ..
rm -rf rke2-ansible/
git submodule add https://github.com/yunzaixi-dev/rke2-ansible.git ./rke2-ansible
Plan the Directory Structure
Next, lay out the project structure:
mkdir -p inventories/prod \
group_vars \
host_vars \
playbooks \
roles
Create the empty files:
touch ansible.cfg \
requirements.yml \
inventories/prod/hosts.yml \
group_vars/all.yml \
group_vars/rke2_servers.yml \
group_vars/rke2_agents.yml \
host_vars/rke2-server1.yml \
playbooks/site.yml \
playbooks/ping.yml \
playbooks/update-packages.yml \
playbooks/set-hostname.yml \
playbooks/disable-ssh-password.yml
The resulting directory structure:
yun@yun ~/V/a/y/p/a/rke2-ansible (master)> tree
.
├── ansible.cfg
├── group_vars
│   ├── all.yml
│   ├── rke2_agents.yml
│   └── rke2_servers.yml
├── host_vars
│   └── rke2-server1.yml
├── inventories
│   └── prod
│       └── hosts.yml
├── playbooks
│   ├── disable-ssh-password.yml
│   ├── ping.yml
│   ├── site.yml
│   ├── update-packages.yml
│   └── set-hostname.yml
├── README.md
├── requirements.yml
└── roles
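What each directory and file is for:
- ansible.cfg: global Ansible configuration; sets the inventory and roles_path.
- requirements.yml: Galaxy dependency list, used to install the lablabs.rke2 role.
- inventories/prod/hosts.yml: production host inventory and groups.
- group_vars/*.yml: group variables, for cluster-wide parameters and for the server/agent groups.
- host_vars/rke2-server1.yml: per-host variables that mark the first control-plane node as the initializer.
- playbooks/site.yml: deployment entry point, covering system preparation and the RKE2 install.
- playbooks/ping.yml: connectivity check playbook that verifies the hosts are reachable.
- playbooks/update-packages.yml: bulk update playbook that upgrades system packages.
- playbooks/set-hostname.yml: sets hostnames in bulk, keeping - and stripping illegal characters.
- playbooks/disable-ssh-password.yml: disables SSH password login so only key login is allowed.
- roles/: directory for roles downloaded from Galaxy.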
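Install the Galaxy Role
micro requirements.yml :
roles:
  - name: lablabs.rke2
    version: "1.49.1"
lablabs.rke2 is a community-maintained RKE2 role (GitHub repository: https://github.com/lablabs/ansible-role-rke2) that wraps the official install script and the service management logic. Pinning it to 1.49.1 keeps the deployment reproducible and reduces the uncertainty introduced by upstream updates.
Install the dependencies: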
ansible-galaxy role install -r requirements.yml -p roles
yun@yun ~/V/a/y/p/a/rke2-ansible (master)> ansible-galaxy role install -r requirements.yml -p roles
Starting galaxy role install process
- downloading role 'rke2', owned by lablabs
- downloading role from https://github.com/lablabs/ansible-role-rke2/archive/1.49.1.tar.gz
- extracting lablabs.rke2 to /home/yun/Vaults/admin/yunzaixi-dev/project/ansible/rke2-ansible/roles/lablabs.rke2
- lablabs.rke2 (1.49.1) was installed successfully
Configure Ansible
micro ansible.cfg (adjust the interpreter_python path to your environment):
[defaults]
inventory = inventories/prod/hosts.yml
remote_user = root
host_key_checking = False
roles_path = ./roles
forks = 10
timeout = 30
deprecation_warnings = False
stdout_callback = default
callback_result_format = yaml
interpreter_python = /usr/bin/python3
Write the Inventory
micro inventories/prod/hosts.yml :
all:
  children:
    rke2_servers:
      hosts:
        rke2-server1:
        rke2-server2:
        rke2-server3:
    rke2_agents:
      hosts:
        rke2-agent1:
        rke2-agent2:
    rke2_cluster:
      children:
        rke2_servers:
        rke2_agents:
Because the SSH config was set up earlier, the host aliases can be used directly here; there is no need to set ansible_host.
Connectivity Check
micro playbooks/ping.yml :
- name: Ping all hosts
  hosts: all
  gather_facts: false
  tasks:
    - name: Ping
      ansible.builtin.ping:
Run:
ansible-playbook playbooks/ping.yml
Output:
yun@yun ~/V/a/y/p/a/rke2-ansible (master)> ansible-playbook playbooks/ping.yml
PLAY [Ping all hosts] ***********************************************************************
TASK [Ping] *********************************************************************************
ok: [rke2-agent1]
ok: [rke2-agent2]
ok: [rke2-server2]
ok: [rke2-server1]
ok: [rke2-server3]
PLAY RECAP **********************************************************************************
rke2-agent1 : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-agent2 : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server1 : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server2 : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server3 : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Set Hostnames in Bulk
A hostname must not contain an underscore (_).
micro playbooks/set-hostname.yml :
- name: Set hostname from SSH alias
  hosts: all
  become: true
  vars:
    raw_hostname: "{{ inventory_hostname | lower }}"
    hostname_from_alias: "{{ raw_hostname | regex_replace('[^a-z0-9-]', '') | regex_replace('^-+', '') | regex_replace('-+$', '') }}"
  tasks:
    - name: Ensure hostname is not empty
      ansible.builtin.assert:
        that:
          - hostname_from_alias | length > 0
        fail_msg: "Derived hostname is empty. Check inventory_hostname: {{ inventory_hostname }}"
    - name: Set hostname
      ansible.builtin.hostname:
        name: "{{ hostname_from_alias }}"
Run:
ansible-playbook playbooks/set-hostname.yml
Result:
yun@yun ~/V/a/y/p/a/rke2-ansible (master)> ansible-playbook playbooks/set-hostname.yml
PLAY [Set hostname from SSH alias] **********************************************************
TASK [Gathering Facts] **********************************************************************
ok: [rke2-server3]
ok: [rke2-server2]
ok: [rke2-server1]
ok: [rke2-agent2]
ok: [rke2-agent1]
TASK [Ensure hostname is not empty] *********************************************************
ok: [rke2-server1] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [rke2-server2] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [rke2-server3] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [rke2-agent1] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [rke2-agent2] => {
"changed": false,
"msg": "All assertions passed"
}
TASK [Set hostname] *************************************************************************
changed: [rke2-agent1]
changed: [rke2-server1]
changed: [rke2-server3]
changed: [rke2-server2]
changed: [rke2-agent2]
PLAY RECAP **********************************************************************************
rke2-agent1 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-agent2 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server1 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server2 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server3 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
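To preview what the filter chain in set-hostname.yml produces for a given alias without touching any server, the same three steps (lowercase, drop everything outside a-z, 0-9 and -, trim leading and trailing hyphens) can be approximated locally. This is only a sketch of the same idea using tr and sed; `sanitize_hostname` is a hypothetical helper, not something Ansible provides.

```shell
# Hypothetical helper mirroring the playbook's filter chain:
#   lower -> remove characters outside [a-z0-9-] -> trim leading/trailing "-"
sanitize_hostname() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C sed -E 's/[^a-z0-9-]//g; s/^-+//; s/-+$//'
}

for alias in rke2-server1 RKE2_Server1 -rke2-agent1-; do
  echo "$alias -> $(sanitize_hostname "$alias")"
done
```

Note that an alias such as RKE2_Server1 loses its underscore rather than having it replaced, so two aliases that differ only in stripped characters would collapse to the same hostname.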
Disable SSH Password Login (Optional)
Before running this, confirm that key-based login works so that you do not lock yourself out of the servers.
micro playbooks/disable-ssh-password.yml :
- name: Disable SSH password authentication
  hosts: all
  become: true
  tasks:
    - name: Write SSH hardening config
      ansible.builtin.copy:
        dest: /etc/ssh/sshd_config.d/99-disable-password.conf
        mode: "0644"
        content: |
          PasswordAuthentication no
          KbdInteractiveAuthentication no
          ChallengeResponseAuthentication no
      notify: Restart sshd
    - name: Validate sshd config
      ansible.builtin.command: sshd -t
      changed_when: false
  handlers:
    - name: Restart sshd
      ansible.builtin.service:
        name: sshd
        state: restarted
Run:
ansible-playbook playbooks/disable-ssh-password.yml
Output:
yun@yun ~/V/a/y/p/a/rke2-ansible (master)> ansible-playbook playbooks/disable-ssh-password.yml
PLAY [Disable SSH password authentication] **************************************************
TASK [Gathering Facts] **********************************************************************
ok: [rke2-agent1]
ok: [rke2-server3]
ok: [rke2-agent2]
ok: [rke2-server1]
ok: [rke2-server2]
TASK [Write SSH hardening config] ***********************************************************
changed: [rke2-server3]
changed: [rke2-agent1]
changed: [rke2-server2]
changed: [rke2-server1]
changed: [rke2-agent2]
TASK [Validate sshd config] *****************************************************************
ok: [rke2-server3]
ok: [rke2-agent1]
ok: [rke2-server2]
ok: [rke2-agent2]
ok: [rke2-server1]
RUNNING HANDLER [Restart sshd] **************************************************************
changed: [rke2-server2]
changed: [rke2-server3]
changed: [rke2-server1]
changed: [rke2-agent2]
changed: [rke2-agent1]
PLAY RECAP **********************************************************************************
rke2-agent1 : ok=4 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-agent2 : ok=4 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server1 : ok=4 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server2 : ok=4 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server3 : ok=4 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
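A successful run only shows that the drop-in file was written and that sshd restarted. sshd uses the first value it finds for each keyword, so an earlier file in /etc/ssh/sshd_config.d/ (some cloud images ship one) could still leave password login enabled. The sketch below checks the effective settings; it assumes the output format of `sshd -T` (lowercase keyword, a space, then the value), and `password_login_disabled` is a hypothetical helper.

```shell
# Hypothetical helper: reads `sshd -T`-style output on stdin and succeeds
# only if both password-related settings are effectively "no".
password_login_disabled() {
  awk '
    $1 == "passwordauthentication"       { pw = $2 }
    $1 == "kbdinteractiveauthentication" { kbd = $2 }
    END { exit !(pw == "no" && kbd == "no") }
  '
}

# Offline demonstration with sample text. Against a real host it would be
# something like: ssh rke2-server1 sshd -T | password_login_disabled
sample='passwordauthentication no
kbdinteractiveauthentication no
pubkeyauthentication yes'

if printf '%s\n' "$sample" | password_login_disabled; then
  echo "password login is disabled"
else
  echo "password login is still enabled"
fi
```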
Update System Packages in Bulk and Reboot (Recommended)
Use this when the hosts already run Rocky Linux 9 and only need their system packages updated. If no reboot is needed, set reboot_after_update to false.
micro playbooks/update-packages.yml :
- name: Update Rocky Linux packages
  hosts: all
  become: true
  serial: 1
  vars:
    reboot_after_update: true
  tasks:
    - name: Update package metadata
      ansible.builtin.dnf:
        update_cache: true
    - name: Upgrade all packages
      ansible.builtin.dnf:
        name: "*"
        state: latest
    - name: Remove unneeded packages
      ansible.builtin.dnf:
        autoremove: true
    - name: Clean package cache
      ansible.builtin.command: dnf clean all
      changed_when: false
    - name: Reboot after update (optional)
      ansible.builtin.reboot:
        reboot_timeout: 3600
      when: reboot_after_update
Run:
ansible-playbook playbooks/update-packages.yml
Output:
yun@yun ~/V/a/y/p/a/rke2-ansible (master)> ansible-playbook playbooks/update-packages.yml
PLAY [Update Rocky Linux packages] **********************************************************
TASK [Gathering Facts] **********************************************************************
ok: [rke2-server1]
TASK [Update package metadata] **************************************************************
ok: [rke2-server1]
TASK [Upgrade all packages] *****************************************************************
ok: [rke2-server1]
TASK [Remove unneeded packages] *************************************************************
ok: [rke2-server1]
TASK [Clean package cache] ******************************************************************
ok: [rke2-server1]
TASK [Reboot after update (optional)] *******************************************************
changed: [rke2-server1]
PLAY [Update Rocky Linux packages] **********************************************************
TASK [Gathering Facts] **********************************************************************
ok: [rke2-server2]
TASK [Update package metadata] **************************************************************
ok: [rke2-server2]
TASK [Upgrade all packages] *****************************************************************
changed: [rke2-server2]
TASK [Remove unneeded packages] *************************************************************
ok: [rke2-server2]
TASK [Clean package cache] ******************************************************************
ok: [rke2-server2]
TASK [Reboot after update (optional)] *******************************************************
changed: [rke2-server2]
PLAY [Update Rocky Linux packages] **********************************************************
TASK [Gathering Facts] **********************************************************************
ok: [rke2-server3]
TASK [Update package metadata] **************************************************************
ok: [rke2-server3]
TASK [Upgrade all packages] *****************************************************************
changed: [rke2-server3]
TASK [Remove unneeded packages] *************************************************************
ok: [rke2-server3]
TASK [Clean package cache] ******************************************************************
ok: [rke2-server3]
TASK [Reboot after update (optional)] *******************************************************
changed: [rke2-server3]
PLAY [Update Rocky Linux packages] **********************************************************
TASK [Gathering Facts] **********************************************************************
ok: [rke2-agent1]
TASK [Update package metadata] **************************************************************
ok: [rke2-agent1]
TASK [Upgrade all packages] *****************************************************************
changed: [rke2-agent1]
TASK [Remove unneeded packages] *************************************************************
ok: [rke2-agent1]
TASK [Clean package cache] ******************************************************************
ok: [rke2-agent1]
TASK [Reboot after update (optional)] *******************************************************
changed: [rke2-agent1]
PLAY [Update Rocky Linux packages] **********************************************************
TASK [Gathering Facts] **********************************************************************
ok: [rke2-agent2]
TASK [Update package metadata] **************************************************************
ok: [rke2-agent2]
TASK [Upgrade all packages] *****************************************************************
changed: [rke2-agent2]
TASK [Remove unneeded packages] *************************************************************
ok: [rke2-agent2]
TASK [Clean package cache] ******************************************************************
ok: [rke2-agent2]
TASK [Reboot after update (optional)] *******************************************************
changed: [rke2-agent2]
PLAY RECAP **********************************************************************************
rke2-agent1 : ok=6 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-agent2 : ok=6 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server1 : ok=6 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server2 : ok=6 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rke2-server3 : ok=6 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
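Deploy RKE2
Write the RKE2 Variables
In lablabs.rke2, rke2_config is a template path (default templates/config.yaml.j2); do not set it to a dictionary. Parameters that should end up in config.yaml go into rke2_server_options / rke2_agent_options.
micro group_vars/all.yml :
rke2_cluster_group_name: "rke2_cluster"
rke2_servers_group_name: "rke2_servers"
rke2_agents_group_name: "rke2_agents"
rke2_channel: "latest"
rke2_version: "v1.34.2+rke2r1"
rke2_token: "CHANGE_ME"
rke2_api_ip: "<LB-or-server1>"
rke2_additional_sans:
  - "<LB-or-server1>"
rke2_selinux: true
rke2_cni:
  - cilium
rke2_token is the shared secret used for cluster registration and must be identical on all nodes. It can be generated with openssl rand -base64 32.
rke2_api_ip is the control-plane entry address. If you have an LB/VIP, use its IP or domain name. Without an LB/VIP, when each machine has a single fixed IP, you can use the IP or domain name of the first control-plane node (for example rke2-server1) and add the same value to rke2_additional_sans. That setup pins the API to a single node, so the control-plane entry point is not highly available; an LB/VIP is recommended for production.
Because Rocky Linux enables SELinux by default, be sure to set rke2_selinux: true and make sure container-selinux is installed. To use Cilium, set rke2_cni to cilium.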
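As a small illustration of replacing the CHANGE_ME placeholder, the sketch below generates a token with the openssl command mentioned above. The fallback branch using /dev/urandom and base64 is an assumption added for machines without openssl, and `generate_rke2_token` is a hypothetical helper name.

```shell
# Hypothetical helper: print a random token made of 32 bytes, base64-encoded.
generate_rke2_token() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -base64 32
  else
    # Fallback (assumption): read 32 random bytes from the kernel instead.
    head -c 32 /dev/urandom | base64
  fi
}

token="$(generate_rke2_token)"
# Avoid echoing the secret itself; report only its length.
echo "generated a token of ${#token} characters"
```

The value would then be pasted into rke2_token in group_vars/all.yml. Since this repository is pushed to GitHub, keeping the real token out of version control (for example with Ansible Vault) is worth considering.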
micro group_vars/rke2_servers.yml :
rke2_server_options:
  - write-kubeconfig-mode: "0644"
micro group_vars/rke2_agents.yml :
rke2_agent_options:
  - node-ip: "{{ ansible_default_ipv4.address }}"
Mark the first control-plane node as the initializing node, micro host_vars/rke2-server1.yml :
rke2_server_options:
  - write-kubeconfig-mode: "0644"
  - cluster-init: true
Write the Playbook
micro playbooks/site.yml :
- name: Base setup
  hosts: all
  become: true
  tasks:
    - name: Install base packages
      ansible.builtin.package:
        name:
          - curl
          - tar
          - socat
          - conntrack
          - iptables
          - container-selinux
        state: present
    - name: Disable swap
      ansible.builtin.command: swapoff -a
      when: ansible_swaptotal_mb | int > 0
      changed_when: false
    # Comment out active swap entries; lines already starting with # are skipped.
    - name: Remove swap from fstab
      ansible.builtin.replace:
        path: /etc/fstab
        regexp: '^([^#].*\sswap\s.*)$'
        replace: '# \1'
    # modprobe lives in the community.general collection, not ansible.builtin.
    - name: Load br_netfilter
      community.general.modprobe:
        name: br_netfilter
        state: present
    # sysctl lives in the ansible.posix collection, not ansible.builtin.
    - name: Enable sysctl for Kubernetes
      ansible.posix.sysctl:
        name: "{{ item.name }}"
        value: "{{ item.value }}"
        state: present
        reload: true
      loop:
        - { name: net.bridge.bridge-nf-call-iptables, value: 1 }
        - { name: net.bridge.bridge-nf-call-ip6tables, value: 1 }
        - { name: net.ipv4.ip_forward, value: 1 }
- name: RKE2 servers
  hosts: rke2_servers
  become: true
  serial: 1
  roles:
    - role: lablabs.rke2
- name: RKE2 agents
  hosts: rke2_agents
  become: true
  roles:
    - role: lablabs.rke2
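The "Remove swap from fstab" task is meant to comment out active swap entries so that swap stays off after a reboot. The effect can be previewed on sample text with sed. This is a sketch of the same idea, not what the replace module runs internally; `comment_swap_lines` and the sample fstab content are made up for illustration.

```shell
# Hypothetical helper: comment out fstab lines that have "swap" as a
# whitespace-delimited field, leaving already-commented lines untouched.
comment_swap_lines() {
  sed -E 's/^([^#].*[[:space:]]swap[[:space:]].*)$/# \1/'
}

sample_fstab='UUID=aaaa / xfs defaults 0 0
/dev/mapper/rl-swap none swap defaults 0 0
# /swapfile none swap sw 0 0'

printf '%s\n' "$sample_fstab" | comment_swap_lines
```

Only the second line gains a leading "# ". The root filesystem entry and the line that was already commented stay as they were, so repeated runs do not stack comment markers.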
Deploy and Verify
Run the Deployment
Run a syntax check first:
ansible-playbook playbooks/site.yml --syntax-check
Run the deployment:
ansible-playbook playbooks/site.yml
Fetch the kubeconfig
Log in to any control-plane node, export the kubeconfig, and use the kubectl binary that RKE2 installs:
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
/var/lib/rancher/rke2/bin/kubectl get nodes -o wide
To use kubectl locally, copy the kubeconfig and point it at the API endpoint:
mkdir -p ~/.kube
scp rke2-server1:/etc/rancher/rke2/rke2.yaml ~/.kube/rke2.yaml
sed -i 's/127.0.0.1/<LB-or-server1>/g' ~/.kube/rke2.yaml
export KUBECONFIG=~/.kube/rke2.yaml
kubectl get nodes -o wide
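The sed call above replaces every 127.0.0.1 in the file, and its unescaped dots match any character. A slightly stricter variant, sketched below, only rewrites the host on the server: line. `rewrite_kubeconfig_server` is a hypothetical helper, 203.0.113.10 is a documentation-range placeholder address, and the sample line assumes the kubeconfig points at https://127.0.0.1:6443.

```shell
# Hypothetical helper: read a kubeconfig on stdin and replace 127.0.0.1 in
# the "server:" URL with the endpoint given as $1 (LB/VIP or first server).
# The endpoint must not contain "#" or "&", which are special to this sed call.
rewrite_kubeconfig_server() {
  sed -E "s#(server:[[:space:]]*https://)127\.0\.0\.1(:[0-9]+)#\1$1\2#"
}

sample='    server: https://127.0.0.1:6443'
printf '%s\n' "$sample" | rewrite_kubeconfig_server 203.0.113.10
```

Applied to the copied file it could be used as rewrite_kubeconfig_server followed by the endpoint, reading ~/.kube/rke2.yaml on stdin and writing to a new file, rather than editing in place.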
At this point, the minimal highly available RKE2 cluster is deployed.